AI based Electrical Component Identifier

Description
- Electrical component themed AI detection and identification.
- Hardware: a mounted camera above a surface (part of the product) produces a controlled-environment live feed for the application.
- Software: an application running inference on a live USB camera feed (optionally an imported picture or video).

Application
- Augmentation: modification of the provided data to simulate differences in the environment and to provide imperfections to train against. Examples: addition of glare, rotation, blurring, addition of spots.
- GUI: the application is based on Qt Creator, using C++.
- Inference: running in C++, utilising Ultralytics YOLOv8.

Summary
- Detection via inference: detect and display boundaries for each identified class from the input image using inference.
- Identification: post-processing of the components in the bounding boxes detected by inference, which may carry additional information identifiable by a variety of approaches. Examples: resistors (resistor code value), LEDs (LED color), IC components (pin count, information written on the component).

Features
- Inference classes to train the model to detect: resistor, diode, capacitor (AC, DC), LED, integrated circuit, LDR.

Milestones
- Base camera rig
- Initial inference model training
- Inference running
- Testing with video footage from a mobile device

Research
- Models: Ultralytics YOLO
- Live labeling
- Focus audience

Set rig
- The set position of the camera, the significant reduction in distance to the objects, the consistency of the lighting provided by the ring light, and the static background will boost the confidence of the inference considerably.
Timeline
- Training, running, post-processing, rationale; Gantt chart; live training.

YOLOv5 training
- 1st batch (test run): image count 100 training / 20 testing; classes (1 total): resistor; augmentation: default; average time per epoch: 34 seconds; epoch count: 400.
- 2nd batch: image count 1800 training / 540 testing; classes (9 total): red_led, green_led, blue_led, yellow_led, ac_capacitor, dc_capacitor, resistor, sip_resistor, pcb_terminal; augmentation: default; epoch count: 250.
- 3rd batch: image count 2393 training / 724 testing; classes (10 total): the previous nine plus metal_nut; augmentation: default; epoch count: 300.

YOLOv8

Set rig rationale
- Because the angle and the lighting are both known and mostly fixed thanks to the set rig, the input dataset does not need to cover angles and lighting beyond what the rig will expose the model to during runtime.
- The angle range is reduced to a top-down view only, eliminating the rest of the angle range. Only the components being detected need to be trained in all angles, as opposed to the camera gathering the dataset needing to be positioned at different angles.
- While the lighting will change depending on room conditions, the ring light around the camera provides significant consistency. This does not eliminate the necessity to train against various lighting conditions, but it reduces their significance and increases the certainty of detection.
- Having a top-down view also eliminates the majority of glare issues caused by high-luminosity bodies, such as clouds or the sun.
- A set rig significantly limits the distance between the objects and the camera during runtime, allowing for further confidence in the predictions.
- The sum of all the points above results in a significant reduction in the data required to train, compared to a setup without a set rig, for equivalent confidence values during runtime.
Static background
- Apart from dust or unexpected objects present on the rig's surface, which should be removed before usage, the background that the objects sit in front of will stay mostly consistent. This reduces the necessity to gather data of the same object against backgrounds that are not expected during runtime.

Focus audience
- While this project may be retrained and refocused to be utilised in many different fields, it is trained for electrical component identification, which is focused towards engineers: both existing engineers, and those interested in becoming engineers. The quick identification of components provided by the project, a count of each, and any potential additional information saves time otherwise spent manually analysing this information.

Average time per epoch for the later batches: 2 minutes, and 2 minutes and 20 seconds.

Acronyms
- SIP: Single Inline Package
- GPU: Graphics Processing Unit
- CPU: Central Processing Unit
- AI: Artificial Intelligence
- LDR: Light Dependent Resistor
- LED: Light Emitting Diode
- AC: Alternating Current
- DC: Direct Current
- PCB: Printed Circuit Board

Identification approaches
- LED color: the most prominent color may be identified by sorting all the colors from the image into their hue values, and checking which hue is most active.
- Resistor color codes: the color codes can be identified by processing the image with filters until only the prominent colors remain; these can be processed into the actual ohm value. The positions of the color codes relative to the body of the resistor can then be used to identify their specific positions and order.
- IC pin count: the pin count can be identified by processing the image with filters until there is a clear contrast between the body of the chip and the pins. One approach that could help identify the number of pins would be drawing a line between two of the pins and seeing how many of the pins touch this line.
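The line-drawing idea for pin counting could be sketched as below. This is only an illustration under the assumption that pin positions have already been extracted as centroid points from the filtered image; the names `maxCollinearPins` and the `tolerance` parameter are illustrative, not part of the project code.

```cpp
#include <cmath>
#include <vector>

struct Point { double x, y; };

// Distance from point p to the infinite line through a and b.
double pointLineDistance(Point p, Point a, Point b) {
    const double dx = b.x - a.x;
    const double dy = b.y - a.y;
    const double len = std::hypot(dx, dy);
    if (len == 0.0) return std::hypot(p.x - a.x, p.y - a.y);
    return std::fabs(dy * (p.x - a.x) - dx * (p.y - a.y)) / len;
}

// For every line defined by a pair of pin centroids, count how many
// centroids lie within `tolerance` of that line, and return the maximum.
// The pins of one row of a chip are collinear, so the best line "touches"
// a full row of pins.
int maxCollinearPins(const std::vector<Point>& pins, double tolerance) {
    int best = 0;
    for (std::size_t i = 0; i < pins.size(); ++i) {
        for (std::size_t j = i + 1; j < pins.size(); ++j) {
            int count = 0;
            for (const Point& p : pins)
                if (pointLineDistance(p, pins[i], pins[j]) <= tolerance)
                    ++count;
            if (count > best) best = count;
        }
    }
    return best;
}
```

For a single-row package such as a SIP resistor the returned value is the pin count itself; for a dual-row package such as a DIP IC, the best line touches one row, so the total pin count would be twice the returned value.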
Taking the line that touches the most pins would provide the pin count of this IC. OCR may also be used for information printed on the component.
- OCR: Optical Character Recognition; software-based reading of alphabetical characters from an image that contains written text.

LED color identification approaches for an input image
- Inference method: different color LEDs may be trained as individual classes. Has the disadvantage of requiring training for each individual LED color separately, as opposed to one generic LED class.
- Algorithm method: has the advantage of working on any LED, but the disadvantage of potentially giving false information if the background is too vibrant. Pipeline: raw input, high contrast filter, colors histogram; after filtering, a clearly prominent yellow remains.

Contrast approach (HSV)
- Take the average of the hue values of all pixels that have a value above a certain threshold; around 0.7 on a range from 0 to 1 should be appropriate. Hue is in the range of 0 to 360 degrees. (In the accompanying figure, the pink dots represent the pixel values obtained from the previous step.)
- Taking the average of this data, the result lands on a degree value that can easily be determined as yellow, by separating the hue circle into sections of colors by degree ranges. Yellow is between 72° and 108° on the hue circle.
- Note: this example would ignore colors darker than 0.7, on a range of 0 to 1.
- HSV (Hue Saturation Value): commonly used to refer to a way of defining colors by their hue, saturation and value properties; an alternative way to represent colors that can be advantageous over RGB in situations such as this.
- RGB (Red Green Blue): commonly used to refer to a way of defining colors by their red, green and blue properties.

Live labeling
- The ability to take a snapshot of the current frame, define appropriate labels, and save this labeled snapshot for future training, all from inside the GUI. Alternatively, taking snapshots from the GUI and saving them for later labeling.

Milestones, sorted from highest priority to lowest:
- Rig: setting up the camera on a rig.
- Base GUI: a GUI with the essentials to interface the camera through USB: a live display from the camera on the rig; the ability to take images by pressing a button; support for running inference.
- Initial dataset gathering: ~100 images of a single class, taken from the rig, for initial training and testing of the model, and for the purpose of testing inference on the rig. A proof of concept; the results will not be perfect, as the dataset is minimal and only contains 1 class.
- Further dataset gathering: at least 250 pictures of each class of every component that the project is designed to detect.
- Further model training: this training will take considerably longer than the initial training, around 2 minutes per epoch, and should be run for at least 300 epochs. The initial training should not take long at all and does not need to be polished; training for ~100 epochs should be sufficient, with each epoch taking ~20 seconds on the machine available.
- Post-processing: the ability to gather further information from the detection bounding boxes provided by the inference.
- Mobile: after the previous steps are in good shape, investigation of moving the inference to a mobile device will begin. If the confidence values are not up to standard, more data will be gathered from this and potentially other mobile devices, with further training to follow, until the results are adequate. If adequate results are achieved before the deadline of this project, deployment to a mobile device will be started. If the frame rates are not sufficient, the inference may be run on still images to improve the user experience.

Model training via deep learning — machine used: personal computer
- Goal: to reach confidence values of 0.8 on a range of 0 to 1.
- CPU: AMD Ryzen 7 5800X3D; core count 8, thread count 16; base clock frequency 3.4 GHz; L3 cache 96 MB; maximum operating temperature 90°C.
- GPU: GeForce RTX 3060 Ti; memory capacity 8192 MB, type GDDR6X; CUDA core count 4864; base clock frequency 1.41 GHz.
- Optional: the ability to label the images from the device, without requiring external software. Given the timeframe of the project, it may be advantageous to instead gather data during a session and label it afterwards.

Machine components
- Memory: 2x16 GB DDR4, frequency 3.6 GHz; brand: Corsair; name: Vengeance RGB PRO SL. Link: https://www.corsair.com/eu/en/Categories/Products/Memory/Vengeance-RGB-PRO-SL-Black/p/CMH32GX4M2E3200C16
- CPU: brand: AMD; name: Ryzen 7 5800X3D. Link: https://www.amd.com/en/products/cpu/amd-ryzen-7-5800x3d
- GPU: brand: NVIDIA; series: 30; name: RTX 3060 Ti. Link: https://www.nvidia.com/en-gb/geforce/graphics-cards/30-series/rtx-3060-3060ti/

CUDA
- Special cores designed for compute-intensive tasks. These run in parallel with the CPU, and may also run in parallel across multiple GPUs. They are perfect for deep learning, as deep learning is incredibly compute-intensive.
- Deep learning training times are predictable and stay mostly constant between epochs. This means there are no race conditions: the more processing power available, the quicker each epoch will finish.

Each of these steps should be polished before continuing to the next one, to provide a solid foundation for the next step to be based on.

Analysis

Brief history
- YOLO, which stands for You Only Look Once, is a popular image segmentation and object detection model originally developed by Joseph Redmon and Ali Farhadi. The first version, YOLOv1, was released in 2015, and it very quickly became popular due to its significantly superior speed and accuracy compared to other architectures.
- YOLOv4: released in 2020, introducing Mosaic data augmentation and a new, improved loss function, decreasing the time taken to achieve better results for the trained model.
- YOLOv5: released in 2020, introducing support for object tracking, which allows following a moving object, and panoptic segmentation, which allows identification of overlapping objects with accurate bounding boxes.
Ultralytics YOLOv8
- The latest version of YOLO as of today. YOLOv8 is a state-of-the-art model that builds upon the already very successful previous YOLO versions, introducing new performance and flexibility features. It has full support for previous YOLO versions, making it incredibly convenient for existing users to take advantage of the new features.

Versions comparison
- In general, YOLOv8 is superior to all of its predecessors. While YOLOv5 mostly underperforms when compared to the later versions, it is important to note how incredibly minimal the delays are even on a version that is now so outdated.
- YOLO offers pretrained models that are used as the starting point for training custom models. Each model has its advantages and disadvantages, and should be picked depending on the project.

Model properties
- Size: the pixel height and width the model operates up to.
- mAP: single-model, single-scale values while detecting on the COCO val2017 dataset.
- Speed: averaged time taken using the Amazon EC2 P4d instance on the COCO dataset.
- Params (in millions): the number of parameters that are tweaked per epoch while training, and processed during inference.
- FLOPS (Floating Point Operations Per Second): a measure based on floating point operations that is relevant in the field of deep learning.

Discussion
- Diminishing returns can be observed in the mAP values when compared to the time taken (speed). In some circumstances, maximum precision is essential and is prioritised over the hardware requirements; this is when a larger model should be chosen.
- In the scope of this project, the YOLOv8m model has been chosen. The rationale behind this choice is to take advantage of the high mAP value while not increasing the time taken too much, in preparation for a future mobile deployment of the model.
- Comparing the YOLOv5 and YOLOv8 versions, a clear advantage can be seen when taking into account the size of the model (param count), the resulting mAP output, and the time taken.
Architecture choice
- YOLO has been chosen as the architecture that this project utilises for the AI detection. At the start of the project, there was already a high bias towards YOLO due to highly positive past experience with YOLOv5 and all the incredible features that it offers. Upon the release of YOLOv8, with all the superior features and specifications it provides on top of the previous versions, YOLOv8 was the obvious choice of architecture for the project.

Description
- As the name suggests, YOLO focuses on detecting multiple classes in a single "look": a single analysis of the entire input image. Many architectures before YOLO approached detection by reanalysing the entire image for every single class the model was trained for, increasing the time taken per detection additively per class; no matter how quick those architectures may be per pass, YOLO's approach is vastly superior.
- An approach like this may seem too good to be true, as if it should come at a significant cost to the speed and confidence of the model. But when the results are analysed, that could barely be further from the truth: YOLO is an incredibly efficient and accurate architecture. These days most sophisticated architectures approach object detection similarly to YOLO, but YOLO is still a state-of-the-art architecture that continues to improve and grow.

Internal AI object detection steps
- Classification: the identification of a part of an image believed to contain an item of a class the model was trained to detect.
- Object detection: the bounding by a box of the classified segments of the image.
- Segmentation: the process of identifying the exact boundary of the item detected.
Visual examples of augmentation
- Resizing; joining up multiple images to create new ones.

The reduction of data required to train makes it feasible to train relatively high quality models from data gathered and trained from home.

Marking codes
- Resistors and inductors: color coded.
- Capacitors: number coded.
- ICs: manufacturer markings.

Hardware candidates
- Raspberry Pi 4, Beaglebone, Nvidia Jetson Nano, Intel Neural Compute Stick 2 (CPU/GPU specifications are listed later).
- Noted specifications: processor base frequency 700 MHz with 2 GB of memory; core count 4 at a maximum frequency of 700 MHz.
- Intel Neural Compute Stick 2: core count (SHAVE) 16. Advantage: offers computational power through a USB connection, so it can be used to run inference on existing devices, such as a laptop.
- Android phone: specifications depend on the specific device. Widely and easily accessible; the vast majority of mobile phones on the market today have a built-in camera.

YOLO: You Only Look Once — the image detection architecture that the project is based on.

Compute used for inference and training: CUDA cores provided by the GPU, and the CPU.

Training hardware: personal computer vs rented dedicated server
- Personal computer, advantages: local — provided a local machine is already owned, it is immediately available. Pictures are taken on the machine itself, so there are no upload/download times.
- Personal computer, cost: as opposed to a rented server, acquiring your own machine has the benefit of owning it, and being able to use it indefinitely (or until it eventually breaks).
- Rented dedicated server, advantages: utilises multiple GPUs — quicker epoch computations, resulting in quicker training. Cloud based, allowing for parallel computing, as opposed to using your personal computer at home.
- Rented dedicated server, disadvantages: upload and download times. Datasets tend to be considerably big in size: a smaller dataset of ~2000 images takes up ~3 GB of space. This is not a significant amount for a local machine to transfer, but it is a considerable amount for uploading.
- Rented dedicated server, cost: the bigger the server, the higher the rates become.
- While the initial cost of acquiring an adequate machine for deep learning is higher than renting a server for a few months, it is a worthwhile long-term investment in a machine that can be used for a variety of casual or intensive tasks.
- Speed: when compared to a sophisticated server running many GPUs, a local machine — which will likely contain one, maybe two GPUs — will most likely process the training at a slower rate than a dedicated server would.

Devices discussion
- After the training is done, which usually spans tens, and sometimes hundreds, of hours depending on the size of the dataset and the epoch count, running the trained model for inference only takes time in the range of milliseconds to process a single frame.

R-CNN
- Description: R-CNN, which stands for Region Based Convolutional Neural Networks, was released in 2013, developed by Ross Girshick. As with other object detection architectures, R-CNN takes an input image and outlines bounding boxes where it believes an item of a certain class is present.
- Disadvantages: not real-time; on average, it takes 47 seconds to process a single frame.
- Discussion: it should be noted that R-CNN has successors called Fast R-CNN and Faster R-CNN. However, even the fastest of these still barely manages 5 frames per second at best. While 5 frames per second is an impressive and definitely usable result, there are alternative architectures that offer a significant improvement in inference time.

SSD
- Description: SSD stands for Single Shot Detector. SSD was released in 2017, developed mostly by Max deGroot and Ellis Brown.
- Discussion: offers great frame rates, averaging 45 frames per second when tested on a now relatively old graphics card, the NVIDIA GTX 1060.
- Disadvantages: according to the Git repository, the project was seemingly abandoned about 4 years ago.
YOLO
- Discussion: one of the most feature-rich, cutting-edge, state-of-the-art and popular architectures in use today.

- CPU: the component of a computer where the core computations are processed.
- GPU: an optional component of a computer that is dedicated to, and optimised for, computing graphical tasks.

Labeling
- Existing labeling software offers quality of life features, such as rough auto-labeling of the images, which only requires the user to adjust the bounding boxes and confirm their validity, rather than having to define the boxes from start to finish.

Inference examples (YOLOv5 architecture)
- Pretrained model yolov5s: surprisingly good results for a model trained from 120 images, with confidence values above 0.8 and sometimes over 0.9!
- Pretrained model yolov5m: rather poor results. Confidence values usually below 0.7; it struggled to classify accurately.
- Pretrained model yolov5m: great results, with confidence values consistently above 0.8, classifying all classes accurately!

Technology utilised: deep learning computation with CPU cores and GPU CUDA cores running in parallel.

Component value examples
- 220 Ohm resistor example, color codes: red = 2, brown = 1, gold = 5% tolerance.
- 100 nF capacitor example.

IC markings
- Unfortunately for the purposes of automatic identification of integrated circuit markings, most IC manufacturers do not follow any global standard for marking their ICs; most tend to have their own internal marking standards. Due to this fact, only known markings can be used to identify components.
- Mixed manufacturer ICs example: this example illustrates the vast variation, and the lack of markings that are identifiable without access to datasheets.
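The arithmetic behind turning detected resistor bands into an ohm value can be sketched as below. This is a minimal illustration assuming the band colors have already been extracted in order; `kColorDigit` and `decodeFourBand` are illustrative names, and the tolerance band is ignored here.

```cpp
#include <cmath>
#include <map>
#include <string>
#include <vector>

// Digit values of the standard resistor color code. In a 4-band resistor,
// the first two bands are significant digits and the third band is the
// power-of-ten multiplier; the fourth band encodes tolerance (e.g. gold = 5%).
const std::map<std::string, int> kColorDigit = {
    {"black", 0}, {"brown", 1}, {"red", 2},    {"orange", 3}, {"yellow", 4},
    {"green", 5}, {"blue", 6},  {"violet", 7}, {"grey", 8},   {"white", 9}};

// Converts an ordered list of band colors into a resistance in ohms.
// Returns -1.0 for unknown colors or too few bands.
double decodeFourBand(const std::vector<std::string>& bands) {
    if (bands.size() < 3) return -1.0;
    const auto d1 = kColorDigit.find(bands[0]);
    const auto d2 = kColorDigit.find(bands[1]);
    const auto mult = kColorDigit.find(bands[2]);
    if (d1 == kColorDigit.end() || d2 == kColorDigit.end() ||
        mult == kColorDigit.end())
        return -1.0;
    return (d1->second * 10 + d2->second) * std::pow(10.0, mult->second);
}
```

For the 220 Ohm example above, `decodeFourBand({"red", "red", "brown"})` yields (2 x 10 + 2) x 10^1 = 220 ohms, with the gold band carrying the 5% tolerance separately.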
Definitions
- PCB (Printed Circuit Board): a board with parts of its conductive layer etched away, with only conductive tracks remaining in specific positions that are pre-planned using CAD software. Widely used to implement electronic circuits.
- CAD (Computer Aided Design): CAD software accelerates and automates design in various different fields. Instructions are given to a computer and translated into more complex and intuitive, usually GUI based, interactive programs.
- AC: electrical current that oscillates.
- DC: electrical current that stays constant.
- LED: an electrical component that emits light when current is passed through the circuit.
- LDR: a resistor that varies in resistance relative to the amount of light the body of the component is exposed to.

Progress
- A ring light has been added for both training and inference running.

Issues encountered: augmentation rotation issue
- Description: a glitch in the augmentation provided by YOLOv5, where rotation during augmentation shifts the bounding boxes of the components, causing inaccurate feedback to the model and preventing it from training appropriately.
- Expected bounding boxes after rotation augmentation: note the snug fit of the bounding box around the edges of the component. That is desirable, as it provides accurate information on what the model should be looking for.
- Actual bounding boxes after rotation augmentation: note the unnecessarily expanded bounding boxes. This will train the model in undesirable ways, detecting parts it should not.
- Submitted GitHub issue, link: https://github.com/ultralytics/yolov5/issues/10639. From the replies as of today's date, this issue has been reported to be part of the YOLOv7 augmentation also.

Training software topics: epochs, augmentation, pretrained models, loss function, train vs val labels.

Platforms
- Desktop/laptop machine (Windows/Linux/Mac): specifications depend on the specific device.
- Desktop/laptop: the specs of a desktop/laptop machine will most likely beat the specs of both a phone and a microprocessor. Ease of access: desktops are widely accessible in environments where it would be relevant to use this project, such as the home of the user, or the campus a student is on. The machine must have permissions for USB connections and for running the application.
- Mobile device: ease of access: most people own a mobile device, and have it on them in most cases. The app may be obtained from an app store, which mobile devices have easy access to as long as they have access to the internet. There are countless types of Android devices on the market, all with varying specifications, and the vast majority have a camera.
- USB computation extensions: ease of access: due to the device being specialised for neural computations, it is not a common device by any means. Combined with the price tag of ~100 EUR, this device will likely only be owned by developers, as opposed to users. As this device is unlikely to be owned by a user of the project, it would not be wise to require owning one to run the inference; instead, the project will support a compute stick as an alternative to a GPU.
- Platforms: the application is designed through Qt Creator, which is cross-platform.
- Microprocessors: how long inference takes is directly tied to the speed of the hardware the model is being run on, and the size of the model. Even with all the speed optimisations offered by the YOLO family, a lower end device such as a Raspberry Pi 4 may take 1-2 seconds to process a single 360p image. It is important to pick appropriate hardware for your particular use cases.
Microprocessor specifications
- Raspberry Pi 4: CPU core count 4, maximum frequency 1.5 GHz; GPU.
- Beaglebone: CPU core count 1, maximum frequency 1 GHz; GPU core count 2, maximum frequency 532 MHz.
- Nvidia Jetson Nano: CPU core count 4, maximum frequency 1.479 GHz; GPU core count 128, maximum frequency 921 MHz.
- Discussion: the Jetson Nano is targeted towards quick graphical computations, which can instead be used for deep learning.

Problem background
- Algorithmic object detection: detection of objects from an image is difficult and unreliable via algorithmic approaches. A computer does not know the difference between pixels representing a cat or a dog.
- There are various cases in which automating object detection, as opposed to having a human constantly observing footage, is beneficial: quality control, security, production analysis.

AI based object detection
- Security: in the field of security through digital cameras, object detection is an incredibly useful tool for monitoring potential intruders at a facility. For facilities that utilise dozens, or even hundreds, of cameras, object detection is a very valuable tool that requires minimal human interaction, with a high level of certainty and 24/7 attention. Depending on the security requirements, lower security facilities may not require hiring a person for constant monitoring of the security footage, with live notifications offered for any unexpected activity observed.
- Production: during production of anything from farm produce to electronic components, consistent detection and rejection of items with damage is essential. With the use of object detection, flaws can be recognised incredibly efficiently, and this information may be passed on to the production line, identifying exactly which item was detected to have flaws so it can be discarded automatically, without ever requiring any human interaction.
- If the detection is sophisticated enough to be more reliable than a person, this opens up new opportunities for the speed and efficiency of the production, as a computer's computational power may be expanded, unlike a person's.
- Quality control: thorough inspection of potential defects on final products is a crucial part of many fields of production. Features that may have developed during the process of manufacture may be detected on the final products; this includes positive, negative, or purely analytical features. Examples: developments of a petri dish colony; PCB manufacturing errors; potential signs of disease on farms; material production flaws.

Algorithm based detection
- Can be used to effectively identify very specific criteria, which can be expressed as an analytical value, or a trend. Examples: specific colors; specific shapes, by following line trends (must be coherent shapes; does not perform well with partial shapes).
- Where algorithm based detection will not be reliable: identification of generic objects, such as trees, cats, dogs, people, and cars; specific patterns.

Deep learning
- Deep learning is achieved through what are known as neural networks: complex combinations of usually many millions of digital neurons, with analog based values for each neuron. These digital neurons work together to interpret the incoming information and produce an output that resembles what the network has learned during the training of the model.
- Epoch: the process of tweaking the entire model, based on the existing parameters and the current output it produces, by use of a loss function, randomness, and clever technique, in the hope of improving detection on the data the model was trained upon.
- Inference: running the trained model on input data to produce detections.
- Training is usually based on a pre-trained model that was trained on a big dataset. Most pre-trained models are trained on the COCO dataset, which is publicly available and holds a vast amount of data.
Training description
- Initially, the model parameters are set to a random state; the output produced when the model is fed input data is, of course, also random.
- Epochs are run on the model to train it; many epochs are run in order to polish the model as much as possible.
- The best model is kept between the newly trained model and the previous best.
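The pattern described above — random initial parameters, repeated epochs driven by a loss function, keeping the best model seen so far — can be illustrated in miniature. This is a toy sketch with a single weight standing in for the many millions of parameters of a real network; `trainToyModel` and all its names are illustrative, not YOLO's training code.

```cpp
#include <random>

// One weight, squared-error loss, gradient descent: the epoch/keep-best
// loop in its simplest possible form.
double trainToyModel(int epochs, double target, unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> init(-10.0, 10.0);

    double weight = init(rng);                    // random initial state
    double bestWeight = weight;
    double bestLoss = (weight - target) * (weight - target);

    const double learningRate = 0.1;
    for (int epoch = 0; epoch < epochs; ++epoch) {
        // The loss gradient tells us which way to tweak the parameter.
        const double grad = 2.0 * (weight - target);
        weight -= learningRate * grad;            // one epoch of training
        const double loss = (weight - target) * (weight - target);
        if (loss < bestLoss) {                    // keep the best model
            bestLoss = loss;
            bestWeight = weight;
        }
    }
    return bestWeight;
}
```

Real training replaces the single weight with millions of parameters, the squared error with a detection loss, and adds randomness such as augmentation and shuffling, but the epoch loop and the keep-the-best-model rule are the same in spirit.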